    From-Below Boolean Matrix Factorization Algorithm Based on MDL

    During the past few years, Boolean matrix factorization (BMF) has become an important direction in data analysis. The minimum description length (MDL) principle has been successfully adapted in BMF for model order selection. Nevertheless, a BMF algorithm that performs well with respect to the standard measures used in BMF is still missing. In this paper, we propose a novel from-below Boolean matrix factorization algorithm based on formal concept analysis. The algorithm uses the MDL principle as the criterion for factor selection. In various experiments, we show that the proposed algorithm outperforms existing state-of-the-art BMF algorithms from several standpoints.
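
    The abstract combines three ingredients: formal concepts as candidate factors, from-below covering (a factor never covers a 0 of the input matrix), and an MDL criterion for choosing factors. The Python sketch below puts them together in a toy greedy loop, assuming a tiny input matrix. It is not the authors' algorithm: the cost function `description_cost` (sum of factor sizes plus the number of uncovered 1s) is a hypothetical stand-in for a real MDL encoding, and all function names are illustrative.

```python
def derive_extent(I, intent):
    """B': objects (row indices) that have every attribute in `intent`."""
    return frozenset(g for g, row in enumerate(I) if all(row[m] for m in intent))


def all_concepts(I):
    """All formal concepts (A, B) of the Boolean matrix I.

    Intents are exactly the intersections of object intents (rows), plus the
    full attribute set, so we close the rows under intersection. Exponential
    in the worst case; only suitable for tiny illustrative matrices."""
    n_attrs = len(I[0])
    intents = {frozenset(range(n_attrs))}
    for row in I:
        row_intent = frozenset(m for m, v in enumerate(row) if v)
        intents |= {row_intent & B for B in intents}
    return [(derive_extent(I, B), B) for B in intents]


def description_cost(I, factors):
    """Hypothetical MDL-style cost: model size (sum of |A| + |B| over factors)
    plus the number of 1s in I that no factor covers. NOT the paper's encoding."""
    ones = {(g, m) for g, row in enumerate(I) for m, v in enumerate(row) if v}
    # Each concept A x B contains only 1s of I, so covering never adds errors (from-below).
    covered = {(g, m) for A, B in factors for g in A for m in B}
    model = sum(len(A) + len(B) for A, B in factors)
    return model + len(ones - covered)


def from_below_bmf(I):
    """Greedily add the concept that lowers the cost most; stop when none does."""
    candidates = [(A, B) for A, B in all_concepts(I) if A and B]
    factors, best = [], description_cost(I, [])
    while True:
        scored = [(description_cost(I, factors + [c]), c)
                  for c in candidates if c not in factors]
        if not scored:
            break
        new_cost, choice = min(scored, key=lambda t: t[0])
        if new_cost >= best:
            break
        factors.append(choice)
        best = new_cost
    return factors, best


I = [[1, 1, 1, 0],
     [1, 1, 1, 0],
     [1, 1, 1, 0],
     [0, 0, 0, 1]]
factors, cost = from_below_bmf(I)
print(factors, cost)
```

    On this toy matrix, the sketch keeps the 3×3 block as a single factor (total cost 7) and leaves the isolated 1 uncovered: under the made-up cost, a one-cell factor costs 2, while leaving that entry as an error costs only 1. This trade-off between model size and error is the kind of decision an MDL criterion is meant to make.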

    Similarity Approach to Defining Basic Level of Concepts Explained from the Utility Viewpoint

    In many practical situations, it is necessary to describe an image in words. From a purely logical viewpoint, the same object can be described by concepts at different levels of abstraction: e.g., when the image includes a dog, we can say that it is a dog, that it is a mammal, or that it is a German Shepherd. In such situations, humans usually select the concept that, to them, is the most natural; this concept is called the basic level concept. However, the notion of a basic level concept is difficult to describe in precise terms; as a result, computer systems for image analysis are not very good at selecting basic level concepts. At first glance, since the question is how to describe human decisions, we should use notions from (well-developed) decision theory, such as the notion of utility. However, in practice, a well-founded utility-based approach to selecting basic level concepts is not as efficient as a heuristic similarity approach. In this paper, we explain this seeming contradiction by showing that the similarity approach can actually be explained in utility terms, provided that we use a more accurate description of the utility of different alternatives.

    PrF

    Abstract. The paper explores the use of Boolean factorization as a method for data preprocessing in the classification of Boolean data. In previous papers, we demonstrated that data preprocessing consisting of replacing the original Boolean attributes by factors, i.e. new Boolean attributes obtained from the original ones by Boolean factorization, improves the quality of classification. The aim of this paper is to explore how the various Boolean factorization methods proposed in the literature affect the quality of classification. In particular, we compare three factorization methods, present experimental results, and outline issues for future research.

    Problem Setting

    In classification of Boolean data, the objects to classify are described by Boolean (binary, yes-no) attributes. As with other classification problems, one may be interested in preprocessing the input attributes to improve the quality of classification. With Boolean input attributes, we might want to limit ourselves to preprocessing with a clear semantics. Namely, as is known, see, e.g.
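
    To make the preprocessing step concrete, the following Python sketch maps Boolean objects into factor space and then classifies them there. It assumes concept-based factors given by their intents, with the l-th factor set to 1 for an object exactly when the object has every attribute of the l-th intent. The intents, training data, and the 1-nearest-neighbour classifier are all hypothetical choices for illustration; they are not the three factorization methods or the classifiers compared in the paper.

```python
def to_factor_features(x, intents):
    """Hypothetical mapping: factor l is 1 for object x iff x has every
    attribute (index) in the l-th intent."""
    return [int(all(x[m] for m in B)) for B in intents]


def hamming(u, v):
    """Number of positions where two Boolean vectors differ."""
    return sum(a != b for a, b in zip(u, v))


def nn_classify(x, train_X, train_y, intents):
    """1-nearest-neighbour in factor space (an illustrative classifier choice)."""
    fx = to_factor_features(x, intents)
    dists = [hamming(fx, to_factor_features(t, intents)) for t in train_X]
    return train_y[dists.index(min(dists))]


# Hypothetical factors (intents over 5 original attributes) and training data.
intents = [frozenset({0, 1, 2}), frozenset({3, 4})]
train_X = [[1, 1, 1, 0, 0], [0, 0, 0, 1, 1]]
train_y = ["a", "b"]

print(to_factor_features([1, 1, 1, 0, 1], intents))           # [1, 0]
print(nn_classify([1, 1, 1, 0, 1], train_X, train_y, intents))  # a
```

    The classifier never sees the five original attributes, only the two factor features, which is the substitution the abstract describes; any classifier for Boolean features could take the place of the nearest-neighbour rule.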

    How to Distinguish True Dependence from Varying Independence?

    A usual statistical criterion for the quantities X and Y to be independent is that their joint distribution function $F(x, y)$ is equal to the product of the corresponding marginal distribution functions. If this equality is violated, this is usually taken to mean that X and Y are dependent. In practice, however, the violation may be caused by the fact that we have a mixture of several populations, in each of which X and Y are independent. In this paper, we show how we can distinguish true dependence from such varying independence. This can also lead to new measures of the degree of independence and of varying independence.

    1 Formulation of the Problem

    Independence: a usual description (see, e.g., [8]). In statistics, independence between two events A and B means that the probability of the event A is not affected by whether the event B occurs or not. For example, the probability of winning a lottery is the same whether or not a person was born in January. In precise terms, this means that the conditional probability $P(A \mid B)$ of A under the condition B is equal to the probability $P(A)$ of the event A: $P(A \mid B) = P(A)$. Since $P(A \mid B) = \frac{P(A \,\&\, B)}{P(B)}$, the above equality is equivalent to
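
    The abstract's central observation, that a mixture of populations in each of which X and Y are independent can look dependent overall, can be checked with exact arithmetic. The Python sketch below uses made-up numbers: two equally weighted populations in which X and Y are independent Bernoulli variables, with success probability 0.9 in one population and 0.1 in the other. It only illustrates the phenomenon; it does not implement the paper's method for telling true dependence from varying independence.

```python
from fractions import Fraction


def mixture_probabilities(components):
    """components: list of (weight, p_x, p_y), with X and Y independent Bernoulli
    variables inside each component. Returns P(X=1), P(Y=1), P(X=1 & Y=1)
    for the mixture."""
    p_x = sum(w * px for w, px, _ in components)
    p_y = sum(w * py for w, _, py in components)
    # Independence inside a component means P(X=1 & Y=1 | component) = px * py.
    p_xy = sum(w * px * py for w, px, py in components)
    return p_x, p_y, p_xy


# Hypothetical populations: equal weights, success probabilities 0.9 and 0.1.
components = [
    (Fraction(1, 2), Fraction(9, 10), Fraction(9, 10)),
    (Fraction(1, 2), Fraction(1, 10), Fraction(1, 10)),
]
p_x, p_y, p_xy = mixture_probabilities(components)
print(p_x, p_y, p_xy, p_x * p_y)  # 1/2 1/2 41/100 1/4
```

    Within each population the product rule holds by construction, yet for the mixture $P(X=1 \,\&\, Y=1) = 0.41 \neq 0.5 \cdot 0.5 = 0.25$, so the usual product criterion alone would report dependence.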